#entropy clipping 24/08/2025
Prefix-RFT: Guiding LLMs with Partial Demonstrations to Merge SFT and RFT
Prefix-RFT blends supervised and reinforcement fine-tuning by using partial demonstration prefixes to guide exploration, achieving stronger and more stable performance on math reasoning benchmarks than SFT, RFT, and hybrid baselines.
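The prefix idea can be pictured with a minimal sketch. This is not the paper's implementation: the function name `make_prefix_prompt` and the parameter `prefix_ratio` are hypothetical, and the summary above does not say how the prefix length is chosen. The sketch only shows a demonstration being cut to a prefix that the policy would then continue on its own.

```python
def make_prefix_prompt(prompt_tokens, demo_tokens, prefix_ratio):
    """Append the first `prefix_ratio` fraction of a demonstration to the prompt.

    Hypothetical helper for illustration only. Returns the extended token
    list and the number of demonstration tokens that were kept.
    """
    if not 0.0 <= prefix_ratio <= 1.0:
        raise ValueError("prefix_ratio must be in [0, 1]")
    cut = int(len(demo_tokens) * prefix_ratio)  # number of demo tokens kept
    # The policy would generate the rest of the answer after this prefix.
    return prompt_tokens + demo_tokens[:cut], cut


prompt = ["Q"]
demo = ["a", "b", "c", "d"]
guided_input, kept = make_prefix_prompt(prompt, demo, 0.5)
```

Under these assumptions, a ratio of 0 leaves only the prompt, so the model explores freely as in RFT, while a ratio of 1 supplies the whole demonstration, which resembles the SFT setting. Values in between give partial guidance.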